Complete Cross-Validation for Nearest Neighbor Classifiers

Authors

  • Matthew D. Mullin
  • Rahul Sukthankar

Abstract

Cross-validation is an established technique for estimating the accuracy of a classifier. It is normally performed either by using a number of random test/train partitions of the data or by k-fold cross-validation. We present a technique for calculating the complete cross-validation for nearest-neighbor classifiers, i.e., averaging over all desired test/train partitions of the data. We apply this technique to several common variants, including K-nearest-neighbor classification, stratified data partitioning, and arbitrary loss functions. We demonstrate, with complexity analysis and experimental timing results, that the technique can be performed in time comparable to k-fold cross-validation, even though it effectively averages over an exponential number of trials. We show that complete cross-validation has the same bias as subsampling and k-fold cross-validation, with some reduction in variance. This algorithm offers significant benefits in terms of both time and accuracy.
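The abstract does not spell out the computation. What follows is a minimal sketch of the idea for the plain 1-nearest-neighbor case only. It is written under our own assumptions rather than taken from the paper: Euclidean distance, 0/1 loss, a fixed training-set size t, and distance ties broken in favor of the lower-indexed point.

Fix a point x and rank the other n - 1 points by distance to x. Among the C(n-1, t) training sets that leave x in the test set, the j-th closest point is x's nearest training neighbor in exactly C(n-1-j, t-1) of them. For that to happen, the j-th closest point must be chosen, the j - 1 closer points must not be, and the remaining t - 1 training points must come from the n - 1 - j farther ones. Weighting each neighbor's label mismatch by that count gives x's expected error in closed form. By symmetry, the average over all partitions equals the mean of these per-point expectations.

The function name `complete_cv_error_1nn` is hypothetical. The sketch does not cover the K-nearest-neighbor, stratified, or general-loss variants the paper describes.

```python
from math import comb

import numpy as np


def complete_cv_error_1nn(X, y, t):
    """Expected 1-NN test error averaged over every split of the n points into
    a training set of size t and a test set of size n - t (0/1 loss).

    Assumptions of this sketch (not taken from the paper): Euclidean distance,
    ties in distance broken toward the lower index, and 1 <= t <= n - 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    if not 1 <= t <= n - 1:
        raise ValueError("training size t must satisfy 1 <= t <= n - 1")

    # Pairwise Euclidean distances between all points.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Number of training sets of size t that exclude a given test point.
    total_subsets = comb(n - 1, t)

    expected_errors = []
    for i in range(n):
        others = np.delete(np.arange(n), i)
        # Rank the other n - 1 points by distance to point i; the stable sort
        # keeps ascending index order among ties.
        order = others[np.argsort(d[i, others], kind="stable")]
        err = 0.0
        for j, nbr in enumerate(order, start=1):
            # The j-th closest point is the 1-NN exactly when it is in the
            # training set and the j - 1 closer points are not, so the other
            # t - 1 training points come from the n - 1 - j farther points.
            ways = comb(n - 1 - j, t - 1)
            if ways == 0:
                break  # all farther neighbors also get zero weight
            if y[nbr] != y[i]:
                err += ways / total_subsets
        expected_errors.append(err)

    # By symmetry, the average over all splits equals the mean of each point's
    # expected error given that it lands in the test set.
    return float(np.mean(expected_errors))


# Usage: two well-separated pairs of points in one dimension.
X_demo = [[0.0], [1.0], [10.0], [11.0]]
y_demo = [0, 0, 1, 1]
print(complete_cv_error_1nn(X_demo, y_demo, t=1))  # roughly 2/3
print(complete_cv_error_1nn(X_demo, y_demo, t=3))  # leave-one-out
```

With t = n - 1, only the single nearest neighbor receives nonzero weight, so the result reduces to ordinary leave-one-out error. The cost is dominated by n sorts, roughly O(n² log n). Enumerating the C(n, t) partitions directly grows combinatorially, which illustrates how an average over exponentially many trials can cost about as much as a few ordinary train/test runs.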

Similar resources

Classification of EEG Signals using adaptive weighted distance nearest neighbor algorithm

Electroencephalogram (EEG) signals are often used to diagnose diseases such as seizures, Alzheimer's disease, and schizophrenia. One main problem with the recorded EEG samples is that they are not equally reliable due to the artifacts at the time of recording. EEG signal classification algorithms should have a mechanism to handle this issue. It seems that using adaptive classifie...

Computerized Pulmonary Artery Catheter Waveform Interpretation

The pulmonary artery catheter (PAC) has been used for decades in the diagnosis and treatment of critically ill patients, but knowledge of PAC waveform interpretation remains inadequate among physicians and nurses. Inspired by the relative success of EKG interpretation programs, this study investigates the feasibility of computerized PAC waveform interpretation. Clinician-provided contextual dat...

Impact of the Sakoe-Chiba Band on the DTW Time Series Distance Measure for kNN Classification

For classification of time series, the simple 1-nearest neighbor (1NN) classifier in combination with an elastic distance measure such as Dynamic Time Warping (DTW) distance is considered superior in terms of classification accuracy to many other more elaborate methods, including k-nearest neighbor (kNN) with neighborhood size k > 1. In this paper we revisit this apparently peculiar relationshi...

Combining nearest neighbor classifiers versus cross-validation selection.

Various discriminant methods have been applied for classification of tumors based on gene expression profiles, among which the nearest neighbor (NN) method has been reported to perform relatively well. Usually cross-validation (CV) is used to select the neighbor size as well as the number of variables for the NN method. However, CV can perform poorly when there is considerable uncertainty in ch...

Extraction of Suitable Features for Breast Cancer Detection Using Dynamic Analysis of Thermographic Images

Introduction: Thermography is a non-invasive imaging technique that can be used to diagnose breast cancer. In this study, a method was presented for the extraction of suitable features in dynamic thermographic images of breast. The extracted features can help classify thermographic images as cancerous or healthy. Method: In this descriptive-analytical study, the images were taken from the IC/UF...

Publication date: 2000